1 Introduction


Given two probability measures, there are several ways to define their
distance. This is, e.g., important in problems where a sequence of measures
converges and the nature of this convergence has to be dealt with in a
quantitative way. Common examples are the relative entropy and the total
variation distance.

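Here we shall focus on the Hellinger distance. As we shall only consider
finite event spaces, say $\#\Omega = N$ with $N \in \mathbb{N}_0$, probability measures $\mu$ on
$\Omega$ are $N$-tuples $(\mu_1, \mu_2, \ldots, \mu_N)$ which satisfy $\mu_\alpha \ge 0$ and $\sum_\alpha \mu_\alpha = 1$. The
Hellinger distance is:

$$d_H(\mu_1, \mu_2) := \Bigl( \frac{1}{2} \sum_{\alpha=1}^{N} \bigl( \sqrt{\mu_{1\alpha}} - \sqrt{\mu_{2\alpha}} \bigr)^2 \Bigr)^{1/2};$$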


sometimes, the factor $\frac{1}{2}$ is left out. The Hellinger distance is a real number
between 0 and 1. A related notion is the affinity between two probability
measures, defined as

$$A(\mu_1, \mu_2) := 1 - d_H^2(\mu_1, \mu_2) = \sum_{\alpha=1}^{N} \sqrt{\mu_{1\alpha}\,\mu_{2\alpha}} = \langle \mu_1^{1/2}, \mu_2^{1/2} \rangle.$$

In the last term, we have used the shorthand notation $\mu^{1/2}$ for the
$N$-dimensional vector $(\mu_1^{1/2}, \ldots, \mu_N^{1/2})$. Two probability measures have
affinity one only when they are equal. Two different degenerate
probability measures have affinity zero.
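The two definitions above can be illustrated numerically. The following is a minimal sketch, assuming measures are given as plain Python lists of non-negative numbers summing to one; the function names `affinity` and `hellinger` are chosen here for illustration only.

```python
import math

def affinity(mu1, mu2):
    """Affinity A(mu1, mu2) = sum over alpha of sqrt(mu1[alpha] * mu2[alpha])."""
    return sum(math.sqrt(p * q) for p, q in zip(mu1, mu2))

def hellinger(mu1, mu2):
    """Hellinger distance including the factor 1/2, so that d_H^2 = 1 - A."""
    s = sum((math.sqrt(p) - math.sqrt(q)) ** 2 for p, q in zip(mu1, mu2))
    return math.sqrt(0.5 * s)

# Illustrative measures on a four-point event space.
uniform = [0.25, 0.25, 0.25, 0.25]
point0 = [1.0, 0.0, 0.0, 0.0]   # degenerate measure on the first point
point1 = [0.0, 1.0, 0.0, 0.0]   # degenerate measure on the second point
```

With these inputs, equal measures have affinity one and two different degenerate measures have affinity zero, in line with the statements above.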

Given several measures $\mu_i$, $i = 1, \ldots, K$, one can ask for a generalisation of
the notion of affinity. The problem is to find a way of measuring how many
of those measures are close to each other. Here we propose to use the concept
of Gram matrix

$$G := \bigl[ \langle \mu_i^{1/2}, \mu_j^{1/2} \rangle \bigr]_{i,j=1}^{K}.$$

$G$ is positive semi-definite and its spectrum is independent of the order of
the $\mu_i$'s.

A lot of information about the mutual affinities of the probability measures
is encoded in the spectrum of $G$. To appreciate this fact, let us for a moment
consider degenerate probability measures. The affinity between any two of
these can only be one or zero. In the case all $K$ probability measures are
equal, all entries of $G$ are equal to 1 and, therefore, its eigenvalues are $K$ and
0 with respective multiplicities 1 and $K - 1$. The other extreme situation
is $K$ different degenerate probability measures, in which case $G$ is the
$K$-dimensional identity matrix with eigenvalue 1 occurring $K$ times. For an
arbitrary set of degenerate measures, i.e. for a set of symbols in $\Omega$, the
spectrum of $G$ determines the relative frequencies of the different symbols
appearing in the set.
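The degenerate case can be checked by hand on a small example. The sketch below assumes a hypothetical list of six symbols and verifies that the indicator vector of each symbol is an eigenvector of the Gram matrix whose eigenvalue is the number of occurrences of that symbol; all names are illustrative.

```python
from collections import Counter

def gram_of_symbols(symbols):
    """Gram matrix of degenerate measures: entry (i, j) is 1 when the two
    point measures sit on the same symbol and 0 otherwise."""
    return [[1 if s == t else 0 for t in symbols] for s in symbols]

def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

symbols = ["a", "b", "a", "c", "a", "b"]   # hypothetical sample, K = 6
G = gram_of_symbols(symbols)
counts = Counter(symbols)

# For each symbol, compare G * indicator with count * indicator.
eigen_checks = {}
for symbol, count in counts.items():
    indicator = [1 if s == symbol else 0 for s in symbols]
    eigen_checks[symbol] = (matvec(G, indicator), [count * x for x in indicator])
```

The non-zero eigenvalues are therefore the occurrence counts, i.e. $K$ times the relative frequencies, and the remaining eigenvalues vanish because the trace of $G$ equals $K$.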

Of course, allowing general probability measures, any positive number can be 
an eigenvalue of G, but the general picture remains and can be described as 
follows. An eigenvalue distribution which puts a lot of weight on eigenvalues 
close to zero indicates that a large group of probability measures are close 
to each other (have large mutual affinities). If, on the other hand, a sizeable 
portion of the eigenvalues occur relatively far away from zero, the probability 
measures have in general low mutual affinities. 

Here we shall study the Gram matrix for independently and randomly chosen
probability measures with respect to the uniform distribution on the simplex
$\Delta_N = \{\mu = (\mu_1, \ldots, \mu_N) \mid \sum_\alpha \mu_\alpha = 1 \text{ and } \mu_\alpha \ge 0\}$. The Gram matrix
and its spectrum are now random objects. We want to study these objects
when both the number of measures and the cardinality of the event space
become large. More specifically, we study the spectrum of the random Gram
matrix in the limit $N \to \infty$, the number of measures $K(N) \to \infty$ and
$K(N)/N \to \tau$ where $\tau$ is a given positive number. We shall explicitly
calculate the limiting expectation value of the empirical eigenvalue distribution

$$\rho_K(x) := \frac{1}{K} \sum_{i=1}^{K} \delta(x - \lambda_i),$$

where $\lambda_1, \ldots, \lambda_K$ are the (random) eigenvalues of the Gram matrix. We
shall, moreover, prove that the convergence occurs with probability 1.

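The setting of this problem is similar to that of the Wishart matrices: let $A$ be
a real random $N \times K$ matrix with i.i.d. $\mathcal{N}(0,1)$ entries, let $K = \tau N$ for $\tau > 0$
and consider the limit $N \to \infty$. It is known that the empirical eigenvalue
distribution of the random matrix $A^*A/N$ converges to the distribution

$$\rho_{MP}(x) = \begin{cases} \delta(x - 1) & \text{if } \tau = 0 \\ \alpha(x, \tau) & \text{if } 0 < \tau \le 1 \\ \bigl(1 - \frac{1}{\tau}\bigr)\,\delta(x) + \alpha(x, \tau) & \text{if } \tau > 1 \end{cases} \qquad (1)$$

with

$$\alpha(x, \tau) = \begin{cases} \dfrac{\sqrt{4\tau x - (x + \tau - 1)^2}}{2\pi\tau x} & (1 - \sqrt{\tau})^2 < x < (1 + \sqrt{\tau})^2 \\ 0 & \text{otherwise.} \end{cases}$$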


This distribution is known as the Marchenko-Pastur distribution and we
shall obtain it in Theorem 1. In |I[], the same distribution arose in the context
of Gram matrices associated to random vectors.
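As a sanity check on the normalisation of the Marchenko-Pastur density, one can integrate its continuous part numerically. This is a minimal sketch using a midpoint rule; the function names and the step count are choices made here, not part of the original text. For $\tau \le 1$ the continuous part should carry total mass 1, and for $\tau > 1$ it should carry mass $1/\tau$, the rest sitting in the point mass at zero.

```python
import math

def mp_density(x, tau):
    """Continuous part alpha(x, tau) of the Marchenko-Pastur distribution."""
    lo = (1.0 - math.sqrt(tau)) ** 2
    hi = (1.0 + math.sqrt(tau)) ** 2
    if not lo < x < hi:
        return 0.0
    inside = max(0.0, 4.0 * tau * x - (x + tau - 1.0) ** 2)  # guard rounding
    return math.sqrt(inside) / (2.0 * math.pi * tau * x)

def continuous_mass(tau, steps=200_000):
    """Midpoint-rule integral of alpha(., tau) over its support."""
    lo = (1.0 - math.sqrt(tau)) ** 2
    hi = (1.0 + math.sqrt(tau)) ** 2
    h = (hi - lo) / steps
    return h * sum(mp_density(lo + (i + 0.5) * h, tau) for i in range(steps))
```

Calling `continuous_mass(0.5)` and `continuous_mass(2.0)` gives values close to 1 and 0.5 respectively, consistent with the three cases of the distribution.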


The paper consists of two more parts. In Section 2 we discuss some general
features of the spectrum of the random Gram matrices and calculate the
limiting expectation of the empirical eigenvalue distribution using the Stieltjes
transform. The main theorem in this section is Theorem 1. In Section 3,
we prove that the convergence of the empirical eigenvalue distribution occurs
almost surely. This is the content of Theorem 2.


2 Convergence in expectation 

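Denote by $\Delta_N$ the simplex $\{\mu = (\mu_1, \ldots, \mu_N) \in \mathbb{R}^N \mid \mu_\alpha \ge 0 \text{ and } \sum_\alpha \mu_\alpha = 1\}$.
It is the space of probability measures on an event space $\Omega$ with $N$
elements. On this space, a uniform measure $\sigma$ can be put in the sense that

$$\int_{\Delta_N} f(\mu)\, d\sigma(\mu) = \frac{1}{|\det(A)|} \int_{\Delta_N} f(A\mu)\, d\sigma(\mu),$$

for every integrable function $f$ on $\Delta_N$ supported in $A\Delta_N$ and for every
invertible stochastic matrix $A$. ($A$ is stochastic if $A_{\alpha\beta} \ge 0$ and $\sum_\alpha A_{\alpha\beta} = 1$.)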

This uniform measure is just the Lebesgue measure on $\Delta_N$. We can also
obtain this measure in terms of the larger space $(\mathbb{R}^+)^N$ of which $\Delta_N$ is a
subset. If we choose $N$ independent random variables $x_i$, all distributed
according to the exponential distribution with some fixed mean, then $\mu :=
(x_1, \ldots, x_N)/(x_1 + \cdots + x_N)$ is uniformly distributed on $\Delta_N$, a fact which is,
e.g., proven in 0. 
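The exponential representation gives a direct sampling recipe. The sketch below assumes mean-one exponentials and a seeded generator for reproducibility; `uniform_simplex_point` is a name introduced here for illustration.

```python
import random

def uniform_simplex_point(n, rng):
    """Draw a point of the simplex by normalising n independent
    exponential variables (mean 1) by their sum."""
    xs = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(xs)
    return [x / total for x in xs]

rng = random.Random(0)              # fixed seed, chosen arbitrarily
mu = uniform_simplex_point(5, rng)  # one random probability measure, N = 5
```

Each component of such a sample has mean $1/N$, which is the simplest instance of the moment formula of Lemma 1.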

Now we choose $K$ measures $\mu_j \in \Delta_N$, independently and uniformly
distributed, and associate with them the Gram matrix $G$:

$$G := \bigl[ \langle \mu_i^{1/2}, \mu_j^{1/2} \rangle \bigr]_{i,j=1}^{K}.$$

We shall study the spectrum of $G$ in the limit $K, N \to \infty$, keeping the ratio
$K/N =: \tau$ fixed. The Gram matrix is of course a random object but its
spectrum has typical properties. The first characteristic of the spectrum is
the presence of one eigenvalue much larger than the others. This eigenvalue
is the norm of $G$ as $G$ is positive definite and it grows, as we shall show,
linearly with $N$. The remaining eigenvalues are typically concentrated on an
interval close to zero. In fact we prove:

Theorem 1 The empirical eigenvalue distribution $\rho_K(x)$ converges weakly in
expectation to the Marchenko-Pastur distribution scaled with the factor
$a = 1 - \frac{\pi}{4}$, i.e.

$$E(\rho_K(x)) \to \frac{1}{a}\, \rho_{MP}\Bigl(\frac{x}{a}\Bigr).$$

We first prove some lemmas and comment on the (expectation of the) norm of
the random Gram matrix. First, we need expectations of arbitrary moments
of the components of random probability measures.


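Lemma 1 Let $\mu = (\mu_1, \ldots, \mu_N)$ be a uniformly random probability measure
from $\Delta_N$ and let $\alpha_1, \ldots, \alpha_N \ge 0$; then

$$E(\mu_1^{\alpha_1} \cdots \mu_N^{\alpha_N}) = \frac{(N-1)!\, \prod_{j=1}^{N} \Gamma(\alpha_j + 1)}{\Gamma(\alpha_1 + \cdots + \alpha_N + N)}. \qquad (2)$$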


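Proof: Using the representation of the uniform measure on $\Delta_N$ in terms of
the exponential distribution with mean 1, we write the expectation as

$$E(\mu_1^{\alpha_1} \cdots \mu_N^{\alpha_N}) = \int_0^\infty dx_1 \cdots \int_0^\infty dx_N\, \frac{x_1^{\alpha_1} \cdots x_N^{\alpha_N}}{(x_1 + \cdots + x_N)^{\alpha_1 + \cdots + \alpha_N}}\, e^{-(x_1 + \cdots + x_N)}.$$

The change of coordinates

$$y_i := \frac{x_i}{x_1 + \cdots + x_N}, \quad i = 1, \ldots, N-1, \qquad y_N := x_N$$

transforms the integral into

$$\int_0^1 dy_1 \int_0^{1-y_1} dy_2 \cdots \int_0^{1-y_1-\cdots-y_{N-2}} dy_{N-1} \int_0^\infty dy_N\; y_1^{\alpha_1} \cdots y_{N-1}^{\alpha_{N-1}}\, y_N^{N-1}\, (1 - y_1 - \cdots - y_{N-1})^{\alpha_N - N} \exp\Bigl( -\frac{y_N}{1 - y_1 - \cdots - y_{N-1}} \Bigr).$$

Integrating with respect to $y_N$ yields

$$\int_0^\infty dy_N\, y_N^{N-1} \exp\Bigl( -\frac{y_N}{1 - y_1 - \cdots - y_{N-1}} \Bigr) = (1 - y_1 - \cdots - y_{N-1})^N\, (N-1)!.$$

After this step, the successive calculation of the integrals over $y_{N-1}, \ldots, y_1$
can be completed using

$$\int_0^x dy\, y^p (x - y)^q = x^{p+q+1} B(p+1, q+1) = x^{p+q+1}\, \frac{\Gamma(p+1)\,\Gamma(q+1)}{\Gamma(p+q+2)}$$

with $B$ the Beta function.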


As a first application of this lemma, we compute the expectation of a single
off-diagonal entry in the Gram matrix

$$E(G_{ij}) = \frac{\pi}{4} + \frac{\pi}{16N} + \frac{\pi}{128N^2} - \frac{\pi}{512N^3} + \cdots, \qquad i \ne j.$$

This means that in the $N \to \infty$ limit, every matrix element has a non-zero
mean and, therefore, the norm of the Gram matrix will grow linearly with 
N] see e.g. P]. It turns out that an expression for E(||G||) can be given in 
terms of the i?-transform, a basic notion from free probability; see p, To 
state the result, we need some terminology. 
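Formula (2) and the expansion of the mean entry can be checked numerically. The following is a sketch under the assumption that the two measures entering an off-diagonal entry are independent, so that $E(G_{ij}) = N\,(E\sqrt{\mu_1})^2$; the helper names are introduced here for illustration, and log-Gamma functions are used only to avoid overflow.

```python
import math

def simplex_moment(alphas):
    """E(mu_1^a_1 ... mu_N^a_N) for mu uniform on the simplex, formula (2)."""
    n = len(alphas)
    log_value = math.lgamma(n)                            # log (N-1)!
    log_value += sum(math.lgamma(a + 1.0) for a in alphas)
    log_value -= math.lgamma(sum(alphas) + n)
    return math.exp(log_value)

def mean_gram_entry(n):
    """Exact E(G_ij) for i != j, using independence of the two measures."""
    half = simplex_moment([0.5] + [0.0] * (n - 1))        # E(sqrt(mu_1))
    return n * half * half

def mean_gram_entry_series(n):
    """First four terms of the large-N expansion of E(G_ij)."""
    return math.pi * (1 / 4 + 1 / (16 * n) + 1 / (128 * n ** 2) - 1 / (512 * n ** 3))
```

For moderately large `n` the exact value and the truncated series agree to high accuracy, and `simplex_moment([0.5, 0.5, 0.0, 0.0])` reproduces $\pi/(4N)$ with $N = 4$.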

In non-commutative probability, a random variable is an element from a
unital algebra and expectation values are given by unital linear functionals $\Phi$ on
this algebra. The moments of the random variable $A$ are $m_n := \Phi(A^n)$ with
$n \in \mathbb{N}$. Another sequence of numbers associated with a random variable are
its free cumulants $(k_n)_{n \in \mathbb{N}}$. These are defined in terms of non-crossing
partitions. A partition $\pi = \{V_1, \ldots, V_s\}$ of the set $\{1, \ldots, n\}$ is called crossing
when there exist numbers $1 \le p_1 < q_1 < p_2 < q_2 \le n$ and $1 \le i < j \le s$ for
which $p_1, p_2 \in V_i$ and $q_1, q_2 \in V_j$. A partition in which no crossing occurs is
called non-crossing. Denote by $NC(n)$ the set of all non-crossing partitions
on $\{1, \ldots, n\}$. The free cumulants are defined recursively by the equations

$$m_n = \sum_{\pi \in NC(n)} k_\pi$$

with $k_\pi = k_{\#V_1} \cdots k_{\#V_s}$ for $\pi = \{V_1, \ldots, V_s\}$. For $n = 1, 2, 3$ the free
cumulants are equal to the usual cumulants of probability theory, where no
restriction on the partitions occurs. Only starting from $k_4$, there is a
difference due to the fact that at least 4 different indices are needed to have
a crossing. E.g., for a centred $A$, which means that $\Phi(A) = 0$, we find
$\Phi(A^4) - 3\Phi(A^2)^2$ for the usual fourth cumulant, while $k_4 = \Phi(A^4) - 2\Phi(A^2)^2$.
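The recursive definition can be turned into a short computation. The sketch below does not enumerate non-crossing partitions; it uses the equivalent standard recursion $m_n = \sum_{s=1}^{n} k_s\, [z^{n-s}]\, M(z)^s$ with $M(z) = 1 + m_1 z + m_2 z^2 + \cdots$, which is an assumption about the route taken here rather than something stated in the text. Function names are illustrative.

```python
def multiply(a, b, degree):
    """Product of two power series (coefficient lists), truncated at `degree`."""
    out = [0] * (degree + 1)
    for i, x in enumerate(a):
        if x == 0:
            continue
        for j, y in enumerate(b):
            if i + j > degree:
                break
            out[i + j] += x * y
    return out

def free_cumulants(moments):
    """Free cumulants k_1..k_n from the moments m_1..m_n."""
    n_max = len(moments)
    series = [1] + list(moments)                  # coefficients of M(z)
    cumulants = []
    for n in range(1, n_max + 1):
        power = [1] + [0] * n_max                 # M(z)^0
        known = 0
        for s in range(1, n):
            power = multiply(power, series, n_max)        # now M(z)^s
            known += cumulants[s - 1] * power[n - s]
        cumulants.append(series[n] - known)       # the s = n term is k_n
    return cumulants
```

For the centred moments $(0, 1, 0, 3)$ of a standard Gaussian this returns a fourth free cumulant equal to $3 - 2 = 1$, whereas the usual fourth cumulant $3 - 3$ vanishes; for the moments $(0, 1, 0, 2)$ of the semicircle law all free cumulants beyond the second are zero.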

The relation between the $(k_n)_n$ and $(m_n)_n$ can be formulated elegantly using
formal power series. The first one is the Cauchy transform

$$C_A(z) := \sum_{n=0}^{\infty} m_n z^{-n-1} = \Phi\bigl( (z - A)^{-1} \bigr)$$

and the second the $R$-transform

$$R_A(z) := \sum_{n=0}^{\infty} k_{n+1} z^n.$$

The relation between these two transforms is then given by Voiculescu's
formula

$$C_A\Bigl( R_A(z) + \frac{1}{z} \Bigr) = z.$$


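Lemma 2 Let $\varphi$ be a normalised vector in a Hilbert space $\mathfrak{H}$, let $X$ be a
bounded, linear, self-adjoint operator on $\mathfrak{H}$ and let $|\varphi\rangle\langle\varphi|$ denote the operator
$\psi \mapsto \langle \varphi, \psi \rangle \varphi$. The norm of

$$A(\epsilon) := |\varphi\rangle\langle\varphi| + \epsilon X, \qquad \epsilon \in \mathbb{R}$$

is given by the asymptotic series

$$\|A(\epsilon)\| \sim 1 + \sum_{n=0}^{\infty} k_{n+1}\, \epsilon^{n+1} = 1 + \epsilon R_X(\epsilon),$$

i.e., for any $n_0 \in \mathbb{N}$

$$\|A(\epsilon)\| = 1 + \sum_{n=0}^{n_0} k_{n+1}\, \epsilon^{n+1} + o(\epsilon^{n_0+1}).$$

The $(k_n)_n$ are the non-crossing cumulants of $X$ with respect to the expectation
$\Phi(\cdot) := \langle \varphi, \cdot\, \varphi \rangle$.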
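Proof: For $\epsilon$ sufficiently small, $A(\epsilon)$ will have an eigenvalue coinciding with
its norm. Let $\psi(\epsilon)$ be the corresponding eigenvector, then

$$\bigl( |\varphi\rangle\langle\varphi| + \epsilon X \bigr) \psi(\epsilon) = \|A(\epsilon)\|\, \psi(\epsilon).$$

The vector $\psi(\epsilon)$ depends continuously on $\epsilon$ and tends to $\varphi$ when $\epsilon \to 0$.
Moreover, $\lim_{\epsilon \to 0} \|A(\epsilon)\| = 1$ |0-. We can rewrite the eigenvalue equation as

$$\psi(\epsilon) = \langle \varphi, \psi(\epsilon) \rangle\, \bigl( \|A(\epsilon)\| - \epsilon X \bigr)^{-1} \varphi.$$

Taking the scalar product with $\varphi$ and using $\langle \varphi, \psi(\epsilon) \rangle \ne 0$ for sufficiently small $\epsilon$ yields

$$\bigl\langle \varphi, \bigl( \|A(\epsilon)\| - \epsilon X \bigr)^{-1} \varphi \bigr\rangle = 1.$$

Then Voiculescu's formula gives

$$C_X\Bigl( R_X(\epsilon) + \frac{1}{\epsilon} \Bigr) = \epsilon = C_X\Bigl( \frac{\|A(\epsilon)\|}{\epsilon} \Bigr), \qquad (3)$$

which is valid for arbitrarily small $\epsilon$ and so

$$\|A(\epsilon)\| \sim 1 + \epsilon R_X(\epsilon).$$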
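The statement of Lemma 2 can be tested on the smallest non-trivial case. The sketch below assumes a two-dimensional space, $\varphi = e_1$ and a real symmetric $X$ with entries $a$, $b$, $c$ chosen arbitrarily; it compares the exact largest eigenvalue with the series truncated after the third free cumulant. The closed forms for the first three cumulants coincide with the classical ones, as noted in the text.

```python
import math

def norm_of_perturbed_projection(a, b, c, eps):
    """Largest eigenvalue of |e_1><e_1| + eps * X with X = [[a, b], [b, c]]."""
    p = 1.0 + eps * a
    q = eps * c
    return 0.5 * (p + q + math.sqrt((p - q) ** 2 + 4.0 * (eps * b) ** 2))

def cumulant_series(a, b, c, eps):
    """1 + k_1 eps + k_2 eps^2 + k_3 eps^3 for the state <e_1, . e_1>."""
    m1 = a                                       # <e_1, X e_1>
    m2 = a * a + b * b                           # <e_1, X^2 e_1>
    m3 = a ** 3 + 2.0 * a * b * b + b * b * c    # <e_1, X^3 e_1>
    k1 = m1
    k2 = m2 - m1 ** 2
    k3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3
    return 1.0 + k1 * eps + k2 * eps ** 2 + k3 * eps ** 3
```

For small `eps` the two functions differ only at fourth order and beyond, which is what the asymptotic series predicts.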


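Using Lemma 2 we can compute the asymptotic series for $E(\|G\|^n)$. Set
$\varphi = \mathbb{1} := \frac{1}{\sqrt{K}}(1, \ldots, 1)$ and $\epsilon = 4/(K\pi)$; then

$$\epsilon G = |\varphi\rangle\langle\varphi| + \epsilon X \quad \text{with} \quad X = G - \frac{K\pi}{4}\, |\varphi\rangle\langle\varphi|.$$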

E.g., for EdlGII), we get 

Ktt 4 / 1 \ 

E (l|G||) = — + E ({ 1 , XI)) + — E (( 1 , XH) - ( 1 , Xl)2) + O j . 

Using (|^) and putting as before t = K/N 

E((1.X1), = 1- 1 + 4 + 0 (1). 

E((l,Xn))=A-r(4^)+0(l), 
E({l,A'l)'^)=o(i). 



and 






We can repeat this procedure to get with arbitrary accuracy the expectation 
of any power of the norm of the Gram matrix. 

To study the asymptotic eigenvalue distribution, we could, as a first step,
try to obtain the moments of the limiting distribution as limits of the
expectations of moments of $\rho_K$. The largest eigenvalue of $G$ contributes with
a weight $1/K$ in $\rho_K$, but, as this largest eigenvalue is essentially located
around $K\pi/4$, its contribution to the expectation value of the $n$th moment
of $\rho_K$ is of the order $K^{n-1}$, which leads to a divergence. We must therefore
remove that contribution and study the expectations of the moments of the
non-normalised distribution

$$\rho'_K(x) := \frac{1}{K} \sum_{\lambda_i \ne \|G\|} \delta(x - \lambda_i). \qquad (4)$$

In the limit, the weight of the largest eigenvalue will become negligible and
we recover a normalised distribution. It turns out that, in principle, the
moments of the limiting distribution can be obtained by calculating the limit
of the expectations of the moments of $\rho'_K$ using (3). The first two moments
yield

$$\lim E\Bigl( \int x\, d\rho'_K(x) \Bigr) = 1 - \frac{\pi}{4}, \qquad \lim E\Bigl( \int x^2\, d\rho'_K(x) \Bigr) = \Bigl( 1 - \frac{\pi}{4} \Bigr)^2 (1 + \tau).$$

These moments coincide of course with those from Theorem 1. The
computation is however very hard, as the $n$th moment of $\rho'_K$ requires $n$ terms
in the series expansion of $E(\|G\|)$. A quite complicated combinatorial
argument is already required just to cancel the orders of $N$ larger than one in the
traces of $G$. A much more convenient function of the spectrum of the Gram
matrices is the normalised trace of its resolvent. The following proof bears
some resemblance to the approach presented in P], but is technically rather 
different. 
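Proof of Theorem 1: Denote by $\sigma(G)$ the spectrum of $G$ and define for
$z \in \mathbb{C} \setminus \sigma(G)$

$$G_K(z) := \frac{1}{K} \operatorname{Tr} \frac{1}{G - z} = \int \frac{1}{x - z}\, d\rho_K(x).$$

The last equality shows that $G_K$ is the Stieltjes transform of the empirical
eigenvalue distribution.

Let $(e_j)_{j=1}^{K}$ be the standard orthonormal basis of $\mathbb{C}^K$ and $z \in \mathbb{C} \setminus \sigma(G)$;
then

$$G_K(z) = \frac{1}{K} \sum_{j=1}^{K} \Bigl\langle e_j, \frac{1}{G - z}\, e_j \Bigr\rangle.$$

Now, for every $j$ in the sum, we peel off the $j$th row and column:

$$G = \begin{pmatrix} G^{(j)} & \varphi^{(j)} \\ \langle \varphi^{(j)}, \cdot\, \rangle & 1 \end{pmatrix}.$$

This means that we write $\mathbb{C}^K$ as $\mathbb{C}^{K-1} \oplus \mathbb{C} e_j$. The corresponding form for
the resolvent is:

$$\frac{1}{G - z} = \begin{pmatrix} \dfrac{1}{G^{(j)} - z} + \dfrac{(G^{(j)} - z)^{-1} |\varphi^{(j)}\rangle\langle\varphi^{(j)}| (G^{(j)} - z)^{-1}}{1 - z - \alpha^{(j)}} & -\dfrac{(G^{(j)} - z)^{-1} \varphi^{(j)}}{1 - z - \alpha^{(j)}} \\ -\dfrac{\langle \varphi^{(j)}, (G^{(j)} - z)^{-1}\, \cdot\, \rangle}{1 - z - \alpha^{(j)}} & \dfrac{1}{1 - z - \alpha^{(j)}} \end{pmatrix}$$

with

$$\alpha^{(j)} := \bigl\langle \varphi^{(j)}, (G^{(j)} - z)^{-1} \varphi^{(j)} \bigr\rangle. \qquad (5)$$

Note that in $\alpha^{(j)}$ the vectors $\varphi^{(j)}$ are the only place where random variables
of the $j$th measure occur. The Stieltjes transform can then be written as

$$G_K(z) = \frac{1}{K} \sum_{j=1}^{K} \frac{1}{1 - z - \alpha^{(j)}}. \qquad (6)$$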
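The peeling-off identity behind (5) and (6) is the Schur complement formula for a diagonal entry of the resolvent. A small numerical sketch, assuming $K = 4$ random measures on $N = 6$ points and an arbitrary non-real $z$, compares that entry computed by direct inversion with $1/(1 - z - \alpha^{(j)})$. The Gauss-Jordan inverse and all names are written here for illustration.

```python
import math
import random

def uniform_simplex_point(n, rng):
    xs = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(xs)
    return [x / total for x in xs]

def gram_matrix(measures):
    roots = [[math.sqrt(p) for p in mu] for mu in measures]
    return [[sum(x * y for x, y in zip(u, v)) for v in roots] for u in roots]

def inverse(matrix):
    """Gauss-Jordan inverse with partial pivoting (complex entries allowed)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def resolvent_entry_direct(G, z, j):
    n = len(G)
    shifted = [[G[r][c] - (z if r == c else 0) for c in range(n)] for r in range(n)]
    return inverse(shifted)[j][j]

def resolvent_entry_schur(G, z, j):
    keep = [k for k in range(len(G)) if k != j]
    phi = [G[k][j] for k in keep]                        # the vector phi^(j)
    minor = [[G[r][c] - (z if r == c else 0) for c in keep] for r in keep]
    minor_inv = inverse(minor)
    alpha_j = sum(phi[r] * minor_inv[r][c] * phi[c]
                  for r in range(len(keep)) for c in range(len(keep)))
    return 1.0 / (1.0 - z - alpha_j)                     # uses G_jj = 1

rng = random.Random(1)
G = gram_matrix([uniform_simplex_point(6, rng) for _ in range(4)])
z = 0.3 + 0.7j
```

Averaging `resolvent_entry_schur(G, z, j)` over `j` then reproduces the normalised trace of the resolvent, which is formula (6).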


We shall now take the limit of the expectation value of (6). Therefore, we
fix a compact $\Lambda \subset \mathbb{C} \setminus \mathbb{R}^+$ and $z \in \Lambda$. We first calculate $E_j(\alpha^{(j)})$, where the
subscript $j$ means that only the random variables appearing in the $j$th vector
will be averaged out. Let $X = 1/(G^{(j)} - z)$ and use $E(\mu_\alpha^{1/2} \mu_\beta^{1/2}) = \pi/(4N)$
with $\alpha \ne \beta$ and $E(\mu_\alpha) = 1/N$:

$$E_j(\alpha^{(j)}) = \frac{1}{N} \Bigl( 1 - \frac{\pi}{4} \Bigr) \operatorname{Tr} G^{(j)} X + \frac{\pi}{4N} \langle \gamma^{(j)}, X \gamma^{(j)} \rangle,$$

with $\gamma^{(j)} := \bigl( \sum_\alpha \mu_{k\alpha}^{1/2} \bigr)_{k \ne j}$. In Lemma 3, we prove that the expectation (now
averaging over all random variables) of the second term converges to $\pi/4$,
uniformly on $\Lambda$. Setting

MA ■■= E aud ff(z) := E . 


we get that 

(““) = V (' - l) " G(i^) + I + 

= (l - ^) (^ + + ^ + z), 

with $Q(N, z)$ converging to zero, uniformly on $\Lambda$. In Lemma 4 we show that

$$E_j\bigl( (\alpha^{(j)})^2 \bigr) = \bigl( E_j(\alpha^{(j)}) \bigr)^2 + R(N, z),$$

where $E(R(N, z))$ (averaging over all remaining random variables) converges
to zero, uniformly on $\Lambda$. This allows us to write


E 


1 — z — ad) 


E(aL)) 


< E 


|E 


a 


Dili 


E(aL)) 


|3(z)P 


^E (^(ad) -E(aL)))^), 


^nR{N,z)) 


which goes to zero, uniformly on A. We get 

/.(a) = E(C.(a)) = 1 f E ^ t + ° 

i=i i=i 


1 

l-a-(l-f)(T + ^r/®(a)) 
Consider for a fixed $z \in \Lambda$ the sequence $f_1(z), f_2(z), \ldots$. This sequence
of complex numbers lies in a compact set, so it must have a convergent
subsequence. Moreover, every convergent subsequence has the same limit
because there is only one number $f(z)$ that satisfies both the equation

$$f(z) = \frac{1}{a - z - a\tau - a z \tau f(z)},$$

with $a = 1 - \frac{\pi}{4}$, and the condition $\Im(z)\, \Im(f(z)) > 0$. From this it is
immediately clear that

$$\lim_{N \to \infty} f_N(z) = \frac{1}{a}\, f_{MP}\Bigl( \frac{z}{a} \Bigr),$$

where $f_{MP}(z)$ is the Stieltjes transform of the Marchenko-Pastur
distribution. Because the convergence in expectation of $f_N(z)$ to $f(z)$ is uniform on
compact subsets of $\mathbb{C} \setminus \mathbb{R}^+$, it follows that $\rho_K(x)$ converges in expectation to
$\frac{1}{a} \rho_{MP}\bigl( \frac{x}{a} \bigr)$.
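The identification of the limit can be checked numerically. The sketch below solves the fixed-point equation as a quadratic in $f$, keeps the root with $\Im(z)\,\Im(f) > 0$, and compares it with a midpoint-rule evaluation of the Stieltjes transform of the scaled Marchenko-Pastur density for $\tau < 1$ (so that no point mass at zero is involved). The test point, step count and names are choices made here.

```python
import cmath
import math

A_SCALE = 1.0 - math.pi / 4.0          # the factor a = 1 - pi/4

def limit_stieltjes(z, tau, a=A_SCALE):
    """Root f of a*tau*z*f^2 - (a - z - a*tau)*f + 1 = 0 with Im(z)*Im(f) > 0."""
    qa = a * tau * z
    qb = -(a - z - a * tau)
    disc = cmath.sqrt(qb * qb - 4.0 * qa)
    roots = [(-qb + disc) / (2.0 * qa), (-qb - disc) / (2.0 * qa)]
    good = [f for f in roots if z.imag * f.imag > 0.0]
    if len(good) != 1:
        raise ValueError("expected exactly one admissible root")
    return good[0]

def scaled_mp_stieltjes(z, tau, a=A_SCALE, steps=200_000):
    """Midpoint-rule value of the integral of alpha(y, tau) / (a*y - z) dy,
    i.e. the Stieltjes transform of (1/a) rho_MP(x/a), for tau < 1."""
    lo = (1.0 - math.sqrt(tau)) ** 2
    hi = (1.0 + math.sqrt(tau)) ** 2
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h
        density = math.sqrt(max(0.0, (hi - y) * (y - lo))) / (2.0 * math.pi * tau * y)
        total += density / (a * y - z)
    return total * h
```

The quadratic is simply the fixed-point equation multiplied out, so the admissible root satisfies that equation by construction; the comparison with the integral is the non-trivial part of the check.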


In the proof of Theorem 1 we used Lemmas 3 and 4. The idea behind their
proofs is the following. Each of the entries in the random Gram matrices
has approximately the same value. The eigenvector belonging to the largest
eigenvalue, i.e. the norm, of such a matrix, has also nearly constant entries.
Vectors like $\gamma^{(j)}$ defined in Lemma 3 are of this kind. This means that an
expression like $\langle \gamma^{(j)}, f(G^{(j)}) \gamma^{(j)} \rangle$ is approximately equal to $f(\|G^{(j)}\|)\, \|\gamma^{(j)}\|^2$.

In the sequel, we shall drop the superscript $(j)$ in $\gamma$ and in $G$ as well. Moreover,
we shall replace $K - 1$ by $K$ wherever it is not relevant for the result, e.g.,
wherever we need quantities estimated up to order 1 in $K$.

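Lemma 3 Let $\gamma = \bigl( \sum_\alpha \mu_{k\alpha}^{1/2} \bigr)_k$; then

$$\lim_{N \to \infty} E\Bigl( \frac{1}{N} \Bigl\langle \gamma, \frac{1}{G - z}\, \gamma \Bigr\rangle \Bigr) = 1,$$

uniformly on compact subsets of $\mathbb{C} \setminus \mathbb{R}^+$.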

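Proof: We start with the calculation of some useful expectations, all of
them just applications of (2). First,

$$E(\|\gamma\|^2) = \frac{\pi}{4} \tau N^2 + \Bigl( 1 - \frac{\pi}{4} \Bigr) \tau N.$$

Setting

$$\eta := \frac{\gamma}{\sqrt{E(\|\gamma\|^2)}}$$

implies $E(\|\eta\|^2) = 1$ and

$$E(\langle \eta, G\eta \rangle) = \frac{\pi}{4} \tau N + \frac{1}{4}(4 - \pi)(1 + \tau) + O\Bigl( \frac{1}{N} \Bigr) \qquad (7)$$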

E {{v, G^r])) = + r) + O (1). (8) 

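Next, we use the spectral theorem for selfadjoint matrices

$$G = \int_0^\infty \lambda\, dE(\lambda).$$

Set $\lambda_0 := E(\langle \eta, G\eta \rangle)$; the spectral measure $d\|E(\lambda)\eta\|^2$ is, in expectation,
very much concentrated around $\lambda_0$:

$$E\Bigl( \int_0^\infty (\lambda - \lambda_0)^2\, d\|E(\lambda)\eta\|^2 \Bigr) = E(\langle \eta, G^2 \eta \rangle) - E(\langle \eta, G\eta \rangle)^2 =: C = O(1),$$

by (7) and (8). A consequence, using Tchebyshev's inequality, is

$$E\Bigl( \int_{|\lambda - \lambda_0| > \lambda_0/2} d\|E(\lambda)\eta\|^2 \Bigr) \le \frac{4}{\lambda_0^2}\, E\Bigl( \int_0^\infty (\lambda - \lambda_0)^2\, d\|E(\lambda)\eta\|^2 \Bigr) = \frac{4C}{\lambda_0^2}. \qquad (9)$$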

Now, we are able to prove the lemma. Consider a compact subset $\Lambda \subset \mathbb{C} \setminus \mathbb{R}^+$
and choose $z \in \Lambda$; then


E 




< 



1 E(||7f) 

NE{{r],Gr])) - z 


1 E(||7|P) 

NE{{r],Gr])) - z 


( 10 ) 


The second term is equal to 


X{4 - 7r){l + t) + z 

XNttt — z + O (1) 
which goes to zero uniformly on $\Lambda$. The first term of (10) gives

1 1 


1 

N 


E (hit) E i (r,, ( 
1 ^ 


= —E 
N 


E 


G — z E ((r;, Gt])) — z 

oo / 1 ^ 


'0 


<^E(||7f)E 


A — z Xq — z 
|Ao-A| 


V 




^E(||7||2)E 


|A - 2 r||Ao - z\ 
|Ao-A| 


l\o/2 |A-z||Ao-^ 
The first integral is bounded from above by 


d\\E{X)vr 
d\\E{X)vr 


Xr 


r‘Ao/2 


E 


|3(2;)||Ao-2;| \Jo 


d\\E{X)vr < 


4G 


Ao|9(2;)||Ao - z\ 


The second integral in (^) is, provided 3^(2:) < Ao/2, bounded by 
1 1 


■E 


|Ao/2 —^1 |Ao —2;| vAo/2 
1 1 


< 


|Ao/2 — z\ |Ao — ^ 


E 


|Ao-A|d||E(A)7||- 


{Xo-Xfd\\E{XM- 


( 


< 


|Ao/2 — z\ |Ao — ^1 




\ 


E / (A„-A)U||£(Ah||UE(|hf) 


V 


=c 


=1 


/ 


We conclude that (10) can be bounded from above by


1 /vrr 


N \ 4 


— —N^ + ( 1 --]tN 


71 


4G 


+ 




Ao|9(2;)||Ao - 2^1 |Ao/2 - z||Ao - 2:| 


( 11 ) 


which gives a uniform bound on $\Lambda$, going to zero when $N \to \infty$.
Lemma 4 With the notations introduced in the proof of Theorem 1,

$$E_j\bigl( (\alpha^{(j)})^2 \bigr) = \bigl( E_j(\alpha^{(j)}) \bigr)^2 + R(N, z),$$

where $E(R(N, z))$ converges to zero, uniformly on compact subsets of $\mathbb{C} \setminus \mathbb{R}^+$.


Proof: Using the notation introduced in (5) and still denoting $1/(G^{(j)} - z)$
by $X$, we compute the expectation with respect to the random variables
appearing in the $j$th random probability measure by multiple applications
of (2). We get




\k,l,7n,n q;,^,7,(5 

k^l,m,n \a,/3,7,(5 ^ 

''' 1 / 2 ^ 1/2 1/2^, I/O TT 


9\^ 1/2-^ 1/2 1/2-^ 1/2 
^ j t^ka _|_ 1 1 

a,/3,7 

E ' 1/2 y 1/2 1 / 2 ^ 1/2 ^ 

Tka ^klTla TmP^rnnTnfS 


a,g 


1/2 


iV(iV + 1) 
1 


2 Tka ^klTljs h'mP^mnTna 

a.,0 

4 E' Wxmi/XIS.x.WS 

a.,0 

^ ^ Tka ^klTla h'ma^^i^h'na _j_ ’ 


iV(iV + 1) 
Svr 

siv^vTi) 

2 


where the symbol $\sum'_{\alpha_1, \ldots, \alpha_r}$ means the sum over all $r$-tuples $(\alpha_1, \ldots, \alpha_r)$
in which no two entries are equal. Denote the seven restricted sums in this
expression $X'_1, \ldots, X'_7$. Rewriting the expression in terms of the unrestricted
sums, which we shall denote by $X_1, \ldots, X_7$, we get


N{N + 1) [ 4 


TT 


Xi 


TT 7r^\ / TT^ 

, X 2 + \ TT — 

2 8 ' 


71 71 


X, + 1 - - - 


16 


X 4 


TT 


2 — TT + 


This can be written as 


^ Svr TT^ 


X 5 + - 


+ 


Stt 


X,+ -1 + — - 


Xv 


1 


X(X + 1) 


TT 


TrG(^')X + ^(7(^’),X7(^')) 


+ (vr - ^) (7(^),XG(^')X7(^)) + (^2 - TT + Tr G^^'^XG^^'^X 


Svr TT^ 

2~ Y 




/c,/,m,n a,/? 


3y 

— 1 -f- - — - 

2 8 




1 /^' 


1/2 


k,l,m,n O' 


From this, the first statement of the lemma is clear. Now it has to be proven 
that the expectation of the remaining terms tends to zero. 


The first term of R{N, z) is 

^ nWTT) ((^ - iT + ^(7«> aY'Y) . 

Now 

E((TrG«A)“)<^(A-l)^ 

while also E X 7 *^'^))^) is of order X^. We shall show this using the 

methods of Lemma |^. Again we omit the superscript (j), since this does not 
change the result in an essential way. We have 

E (II 7 (g) 7||2) = ^ “ 0 r^vrXl 

Set 

1 

V ■= -r7- 

E(||707||2)4 
This definition ensures that $E(\|\eta \otimes \eta\|^2) = 1$. (Note that this definition
differs slightly from the one given in the previous lemma. The previous
definition would have given 1 + 0 for the expectation of the square of 

the norm of 7 0 7 .) Again using (|^), we have 

E{{ri^ri,G®lr]®r])) = —N + (^1 - -j (1 + r) + O f — J 

2 2 

TT i TT'T / TT \ 

E ((7 0 7 , (g) I 7 (g) 7])) = ^ ~ 4 ) + O (1) • 

Denote by E{Xi, A 2 ) the joint spectral family of the commuting operators 
G g I and 11 g G and put Aq := E ((77 g 7 , G g I 7 g 7 )). We then have 


E(y(Ai-A„)M||£:(A,,A2)r,0r,||A =:G' = 0(1) i = l,2 

EGd||B(A„A2)7,®7,||A < ^ 
with A := {(Ai, A 2 ) I (Ai — Ao)^ + (A 2 — Aq)^ < Ag/4}. We write 
E ({ 7 , Xjf) = E (II 7 ® 7f) E ^7 ® 7 , ® ’'0 

= E (||7 ® 7|1") E (^y d||B(Ai, A2)r, ® ) . 

Then 


E 


IA (-^1 ~ z){X 2 — z) 


d||E(Ai, X2)r] g r]\\‘ 


< 


|Ao/2 — z| |Ao/2 — z\ 


and 


E 


d||E(Ai,A2)r7g 


< 


8 G' 


|A(z)PA§ 


2 • 


/ac (-^1 ~ ^)(-^2 ~ z ) 

As Ao is of the order these last two inequalities show that the first term 
of R{N, z) goes uniformly to zero on compact subsets of C \ R’*'. 


The second term of R{N,z) contains the matrix element . 

Again in the notation of Lemma ^ this gives 


|E(( 7 ,XGA 7 ))|=E 


E 


<E(||7f)E(y 

+ E(||7lt)E 


A 


(A-z)2 

Ao/2 1^1 


|A-;s|2 

- |A| 

|A-z|2 


d\\E{X)vr 
d\\E{X)vr 
d\\E{X)vr 


'Ao/2 
The first term can be bounded from above by $E(\|\gamma\|^2)\, (\lambda_0 / |\Im(z)|^2)\, (4C/\lambda_0^2)$
by an application of formula (9). This gives a bound of order $O(N)$. The
second term has, provided that $\Re(z) < \lambda_0/2$, a bound


E(||7lP 


■E 


\Xo/2-z\^ VAo /2 
E(||7f 


|A|d||E(A)7|p 


< 


< 


|Ao/2-^|2 

E(||7f) 
|Ao/2 — z\^ 


E 


coo \ 1/2 

i2 111 z7'MN„I|2 


'Ao/2 


yd\\E{XM‘ 


1/2 


(E((7,G27)))'/Me(||7|P) 


=1 


which is also, as a consequence of (|^ and (|), of order O (N). 
The third term of R{N, z) is also of order O {N) by 


TrG—^ 


G-z G-z 


, 2z z^ 

Tr I + --+ 


G-z (G-z) 


<K{1 + 


2UI 


\Xs(z) 


+ 




The fifth term admits the following estimate. Writing for the vector 
> we have 


e( E E '‘‘o 

k,l,m,n a 


mnl^na 


E 

a 

E 


E ( (^Q, ® 
E 


G-z G-z 
0 Jo Xi- z X 2 -Z 


POO /*oo 


Ca ® ^a) 


di|E(Ai,A2)ea®eo 




r2 

'l+l/iV 


( 12 ) 
The fourth term is estimated as


k,l,m,n ol^(3 


< 


4^ 

\ a 




k,l 


EE ^ma^rnn^nfi 


m,n j3 


< 




\k,4m,n O' 


1/2 


®( E E 

,k,4m,n q :,/ 3,7 


1/2 

n'y 


1/2 


The first factor can be treated like (12), while the second factor is just


Hence, all terms contributing to R{N, z) are of O {N) divided by N{N + 1). 
Therefore, the bound on R{N, z) tends to zero for large dimensions. 


3 Almost sure convergence 


In fact we can prove a stronger result. The empirical eigenvalue distributions
are random measures. The randomness is described by the reference
probability space $\times_{j,\alpha \in \mathbb{N}} (\mathbb{R}^+, e^{-x} dx)$ through the realization
$\mu_j = (x_{j1}, \ldots, x_{jN})/(x_{j1} + \cdots + x_{jN})$. We shall denote by $P$ probabilities
with respect to this reference probability space.

Theorem 2 The convergence in Theorem 1 occurs with probability 1.

Proof: We essentially follow the proof in [Q for the almost sure convergence 
of the empirical eigenvalue distribution of the complex Wishart matrices, but
use a different concentration-of-measure inequality. We need to show that

$$P\Bigl( \lim_{N,K \to \infty} \frac{1}{K} \operatorname{Tr} f(G_K) = \frac{1}{a} \int_0^\infty f(x)\, \rho_{MP}\Bigl( \frac{x}{a} \Bigr) dx \Bigr) = 1$$

with $a = 1 - \frac{\pi}{4}$ and $f$ an arbitrary continuous function on $\mathbb{R}^+$ vanishing at
infinity. We can further restrict ourselves to a dense subset of such functions,
namely, we take for $f$ a differentiable function on $\mathbb{R}^+$ with compact support.
Define the function $g$ by setting $g(x) := f(x^2)$ for $x \in \mathbb{R}^+$. Then $g$ is also
differentiable with compact support and, like $f$, a Lipschitz function with
constant

$$c_1 = \sup_{x \in \mathbb{R}^+} |g'(x)|.$$


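Let $\mu = (\mu_1, \ldots, \mu_K)$ and $\sigma = (\sigma_1, \ldots, \sigma_K)$ denote two sets of $K$ probability measures in
$\Delta_N$. Define the $N \times K$ matrix $A_\mu$ by

$$A_\mu := \bigl( \sqrt{\mu_1}\;\; \sqrt{\mu_2}\;\; \cdots\;\; \sqrt{\mu_K} \bigr),$$

with the vectors $\sqrt{\mu_j}$ as columns, and analogously for $A_\sigma$. Then $A_\mu^* A_\mu$ is the Gram matrix associated with the
set of measures $\mu$. Define $F : \Delta_N \times \cdots \times \Delta_N \to \mathbb{R}$ by

$$F(\mu) := \frac{1}{K} \operatorname{Tr} f(A_\mu^* A_\mu).$$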

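We want to show that the function $F$ satisfies a Lipschitz condition. Define
$\tilde{A}_\mu$ and $\tilde{A}_\sigma$ by

$$\tilde{A}_\mu := \begin{pmatrix} 0 & A_\mu^* \\ A_\mu & 0 \end{pmatrix} \quad \text{and} \quad \tilde{A}_\sigma := \begin{pmatrix} 0 & A_\sigma^* \\ A_\sigma & 0 \end{pmatrix}.$$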

Now Lemma 3.5 in is used to transport the Lipschitz property of g on M’*' 
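to $M_{N+K}(\mathbb{C})_{sa}$, the set of $(N + K)$-dimensional complex selfadjoint matrices.
This lemma implies

$$\|g(\tilde{A}_\mu) - g(\tilde{A}_\sigma)\|_{HS} \le c_1\, \|\tilde{A}_\mu - \tilde{A}_\sigma\|_{HS}, \qquad (13)$$

where $\|A\|_{HS} := \sqrt{\operatorname{Tr} A^* A}$. We have

$$g(\tilde{A}_\mu) = \begin{pmatrix} f(A_\mu^* A_\mu) & 0 \\ 0 & f(A_\mu A_\mu^*) \end{pmatrix}$$

and an analogous expression for $g(\tilde{A}_\sigma)$. Now (13) implies

$$\|f(A_\mu^* A_\mu) - f(A_\sigma^* A_\sigma)\|_{HS}^2 + \|f(A_\mu A_\mu^*) - f(A_\sigma A_\sigma^*)\|_{HS}^2 \le c_1^2 \bigl( \|A_\mu - A_\sigma\|_{HS}^2 + \|A_\mu^* - A_\sigma^*\|_{HS}^2 \bigr).$$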
Since $\|A_\mu - A_\sigma\|_{HS} = \|A_\mu^* - A_\sigma^*\|_{HS}$ and because of the Cauchy-Schwarz
inequality, we have

$$|F(\mu) - F(\sigma)| \le \frac{1}{\sqrt{K}}\, \|f(A_\mu^* A_\mu) - f(A_\sigma^* A_\sigma)\|_{HS} \le c_1 \sqrt{\frac{2}{K}}\; \|A_\mu - A_\sigma\|_{HS}.$$

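Now

$$\|A_\mu - A_\sigma\|_{HS}^2 = \sum_{j=1}^{K} \|\sqrt{\mu_j} - \sqrt{\sigma_j}\|^2 = \sum_{i=1}^{K} \sum_{\alpha=1}^{N} \bigl| \sqrt{\mu_{i\alpha}} - \sqrt{\sigma_{i\alpha}} \bigr|^2 \le \sum_{i=1}^{K} \sum_{\alpha=1}^{N} |\mu_{i\alpha} - \sigma_{i\alpha}|$$

$$\le \sum_{i=1}^{K} \sqrt{N} \Bigl( \sum_{\alpha=1}^{N} |\mu_{i\alpha} - \sigma_{i\alpha}|^2 \Bigr)^{1/2} = \sqrt{N} \sum_{i=1}^{K} \|\mu_i - \sigma_i\|,$$

with the notation $\sqrt{\mu_j} = (\sqrt{\mu_{j1}}, \ldots, \sqrt{\mu_{jN}})$. Now for arbitrary $t > 0$ we
have, using this Lipschitz condition,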


K 


P(|F(m) - F(a)\ X) = P (IFCm) - F(a)\^ > F) < P ( cf-vWJ^ ||m. - ^.|| > t 


2 = 1 


K 


IlMi - ^i\\ > 


, 2=1 


Ft^/N 
2cj 


K 


K 




, 2=1 


2 = 1 


Ft\/N 
2cf 

(14) 


Using Lemmas 5 and 6, we know that there exist constants $T > 0$ and $c_2 > 0$
such that for $t > 2T/\sqrt{N}$


K 


P >f <iLexp 


C2tN 


. 2 = 1 


From this, it follows that if we choose N > 8TcI/tF, the probability 
can now be treated analogously as in the proof of Lemma 0 yielding: 


m I -rm I t^T\fN 


. 2 = 1 


2=1 


< K exp — I 


(UZEa,-) + K^- exp - f 

\2 2c\ ) ^ V 2 2c? ) 


t^T\fN . 


2Ar2 fC2t^TN^F 

<2r=iV^exp-(^^ 
Then


P (|F(^) - E(F)| >t)=¥ (exp [X^\F{tx) - E(F)p] > exp(A^t^)) 

^ E(exp [A2 |F(/x)-E(F)|^]) 

Take now $N > 32 T c_1^2 / (\tau t^2)$. The function $t \mapsto \exp[\lambda^2 (F(\mu) - t)^2]$ is convex,
so Jensen's inequality implies


E^ (exp [A^ \F{^^) - E^ (F(a))|2]) < E^,^ (exp [X^\F{fx) - F{a)\^]) 


2X^Ce^ ^>(|F(/x) -F(cr)| > C) dC 


1-112 


< 


2X^Ce^^^^dC + 2t‘^N^ 


/i/2 ^ 


4c2 


dC'. 


If we choose A^ = C 2 riV^/^/ 8 c^, we get 


2r2iV2 + exp£^^ Sc^rN^H^ 

P m^^) - HF) I > t) <-< 2 exp . ^ 


exp 




32cf 


An application of the Borel-Cantelli lemma shows that this implies

$$P\Bigl( \lim_{N \to \infty} |F(\mu) - E(F)| \le t \Bigr) = 1,$$

for arbitrary $t$. This completes the proof.


Lemma 5 There exist absolute constants $T > 0$ and $c > 0$ such that for $N$
independent exponentially distributed random variables $X_1, \ldots, X_N$ and all
$t > T/\sqrt{N}$ holds that, with $X = (X_1, \ldots, X_N)$ and $S = X_1 + \cdots + X_N$,


P 




< e 


-ctN 


Proof: See Theorem 3 and Lemma 1 in 0] . 









Lemma 6 Suppose that for a random variable $X \ge 0$ there exist constants
$c > 0$ and $T > 0$ such that for all $t > T$

$$P(X > t) \le e^{-ct};$$

then, for $N$ identical independent copies $X_1, \ldots, X_N$ of $X$ and for $t > 2T$

P 


Proof: We prove this by induction on N. The statement is obviously true 
for X = 1 because t > 2T > T, so P (Xi > t) < e“^* < Suppose now 

that the statement is true for X — 1 copies, then for t > 2T 



N-l 




Xn > t — Xi 


i=l 

N-l 


= E 




Xjv > t - ^ Xi ) 
i=l 
Af-1 


< + (X - l)e-"‘/^ = Xe-"‘/l 
< + (X - l)e-"‘/^ = Xe-"‘/l